In the past decade, numerous companies have made significant strides in developing autonomous vehicle (AV) systems utilizing deep neural networks (DNNs). These systems have transitioned from basic rule-based models to Advanced Driver Assistance Systems (ADAS) and fully autonomous vehicles. Such systems necessitate vast amounts of data and an extensive array of computing resources (vCPUs and GPUs) for effective training.
This article discusses various development strategies, the functional components of ADAS, the design methodologies for a modular pipeline, and the obstacles faced during the construction of an ADAS system.
DNN Training Methods and Design
AV systems are primarily built on deep neural networks. When designing an AV system, two predominant approaches emerge, distinguished by their DNN training techniques and the system’s architecture.
- Modular Training – This strategy involves dividing the system into distinct functional units (such as perception, localization, prediction, and planning). This modular pipeline design is widely adopted among AV system manufacturers, as it allows each module to be developed and trained independently.
- End-to-End Training – In this approach, a singular DNN model is trained, processing raw sensor data to generate driving commands. This monolithic structure is mainly pursued by researchers. The architecture often employs reinforcement learning (RL) based on a reward/penalty framework or imitation learning (IL) by observing human driving behaviors. While this method simplifies the architecture, it poses challenges in interpretation and diagnostics. Nevertheless, data annotation is economical since the system learns from human actions.
In addition to these methods, a hybrid approach is also being explored, where two distinct DNNs are interconnected through an intermediate representation.
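To make the end-to-end approach concrete, here is a minimal imitation-learning sketch in PyTorch. It assumes a hypothetical dataset of camera frames paired with recorded human steering and throttle commands; the network shape, input size, and names are illustrative only, not any particular provider's architecture.

```python
import torch
import torch.nn as nn

class EndToEndDriver(nn.Module):
    """Toy end-to-end model: raw camera frames in, driving commands out."""
    def __init__(self):
        super().__init__()
        # Small convolutional encoder over RGB frames; the 66x200 input
        # size is an assumption borrowed from early end-to-end demos.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 24, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, kernel_size=5, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        # Head regresses two commands: steering angle and throttle.
        self.head = nn.LazyLinear(2)

    def forward(self, frames):
        return self.head(self.encoder(frames))

model = EndToEndDriver()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

# One imitation-learning step: regress the model's commands toward the
# human driver's recorded commands (random tensors stand in for data).
frames = torch.randn(8, 3, 66, 200)   # batch of camera frames
human_commands = torch.randn(8, 2)    # [steering, throttle] labels
optimizer.zero_grad()
loss = loss_fn(model(frames), human_commands)
loss.backward()
optimizer.step()
```

In a reinforcement-learning variant, the same network would instead be updated from a reward signal rather than recorded human commands.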
This post elaborates on the functional components of a modular pipeline approach.
Automation Levels
The SAE International (formerly the Society of Automotive Engineers) J3016 standard, the reference most widely cited in discussions of driving automation, defines six levels of driving automation. These range from Level 0 (no automation) to Level 5 (full driving automation), as summarized in the table below.
| Level | Name | Feature |
| --- | --- | --- |
| 0 | No Driving Automation | Human drives |
| 1 | Driver Assistance | Human drives |
| 2 | Partial Driving Automation | Human drives |
| 3 | Conditional Driving Automation | System drives with human as backup |
| 4 | High Driving Automation | System drives |
| 5 | Full Driving Automation | System drives |
Modular Functions
The diagram below illustrates a modular functions design.
At higher automation levels (Level 2 and above), the AV system performs several functions:
- Data Collection – The AV system continuously gathers high-precision data about its surroundings. Multiple sensor types are employed, and their capabilities overlap in complementary ways. As the AV field continues to evolve, there is still no consensus on the types and standards of sensors used. In addition to the devices listed below, vehicles may use GPS for navigation and rely on maps and Inertial Measurement Units (IMUs) to measure linear and angular acceleration. Depending on the ADAS implementation, the following devices may be included:
- Cameras – Visual devices akin to human perception, offering high resolution but with limited depth estimation and degraded performance in extreme weather.
- LiDAR – High-cost sensors that deliver a 3D point cloud of the environment, providing precise depth and speed estimation.
- Ultrasonics – Compact, cost-effective sensors effective only over short distances.
- Radar – Functional over both short and long ranges and effective in low visibility and harsh weather conditions.
- Data Fusion – The various sensors within the AV system generate signals that, while limited individually, collectively provide complementary insights. The AV system integrates these signals to construct a comprehensive understanding of the environment, which is then used to train the DNN (a minimal time-alignment sketch appears after this list).
- Perception – The AV system interprets the raw sensor data to build an understanding of the vehicle’s surroundings, identifying obstacles, traffic signs, and other relevant objects. This process, known as road scene perception, involves detecting and classifying items such as nearby vehicles, pedestrians, traffic lights, and signs. This function also estimates depth and handles lane detection, lane curvature, curb detection, and occlusion, all of which are vital for path planning and route optimization.
- Localization and Mapping – To safely operate and optimize vehicle navigation, AV systems need to comprehend the positioning of detected objects. They create a 3D map and continuously update the position of the ego vehicle and its environment. Advanced systems can even predict the movement dynamics of detected objects.
- Prediction – Using data from other modules, AV systems forecast how the environment will evolve in the near future. The onboard DNN predicts the position of the ego vehicle and its interactions with surrounding objects by extrapolating kinematic states over time (position, velocity, acceleration, jerk); a constant-jerk extrapolation is sketched after this list. It can anticipate potential traffic violations, collisions, or near misses.
- Path Planning – This function outlines the potential routes the vehicle can pursue based on inputs from perception, localization, and prediction. To determine the optimal route, the AV system integrates localization, maps, GPS data, and predictions. Some systems create a bird’s-eye view by projecting the kinematics of the ego vehicle and other objects onto a static path to generate a 3D map. Others may also incorporate data from surrounding vehicles. Ultimately, the planning function seeks to identify the best route among all candidates while maximizing driver comfort (e.g., favoring smooth turns over sharp ones, or gradual deceleration instead of a sudden stop at stop signs); a comfort-cost sketch follows this list.
- Control and Execution – This module translates the route planner’s output into actionable driving maneuvers, such as accelerating, decelerating, stopping, and steering. The controller’s aim is to follow the planned trajectory (a PID baseline is sketched after this list).
- Training Pipeline – DNNs that provide vehicle predictions must undergo training, typically conducted offline with data sourced from vehicles. This training demands thousands of computing units over extended periods; the data volume and computational power required depend on both the model architecture and the specific AV system provider. Training necessitates labeled data, which is partly manually annotated and partly generated through automation. Typically, personally identifiable information (PII), such as license plate numbers and faces, is anonymized. Many providers augment their labeled datasets with simulation, enabling them to create data for specific scenarios.
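To illustrate the Data Fusion step referenced above, the sketch below timestamp-aligns the most recent reading from each sensor into a single observation. The sensor names, tolerance window, and structure are assumptions for illustration, not a production fusion stack.

```python
from dataclasses import dataclass, field

@dataclass
class Reading:
    timestamp: float   # seconds
    sensor: str        # e.g. "camera", "lidar", "radar" (illustrative names)
    payload: object    # raw frame, point cloud, or detection list

@dataclass
class SensorBuffer:
    """Collects readings per sensor and aligns them in time."""
    readings: dict = field(default_factory=dict)

    def add(self, r: Reading) -> None:
        self.readings.setdefault(r.sensor, []).append(r)

    def fuse(self, t: float, tolerance: float = 0.05) -> dict:
        """Return, for each sensor, the reading closest to time t,
        dropping sensors with nothing inside the tolerance window."""
        fused = {}
        for sensor, buf in self.readings.items():
            best = min(buf, key=lambda r: abs(r.timestamp - t), default=None)
            if best is not None and abs(best.timestamp - t) <= tolerance:
                fused[sensor] = best
        return fused
```

The fused dictionary is what a perception model would consume as one synchronized snapshot of the environment.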
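The Prediction item describes extrapolating kinematic states over time; the baseline physics is a third-order Taylor expansion under constant jerk, sketched below. Real systems layer learned interaction models on top of this, so treat it only as the kinematic core.

```python
import numpy as np

def extrapolate_positions(position, velocity, acceleration, jerk, dt, steps):
    """Propagate a 2D kinematic state forward under a constant-jerk model,
    returning predicted positions at t = dt, 2*dt, ..., steps*dt."""
    p, v, a, j = (np.asarray(x, dtype=float)
                  for x in (position, velocity, acceleration, jerk))
    predictions = []
    for k in range(1, steps + 1):
        t = k * dt
        # Third-order Taylor expansion of position.
        predictions.append(p + v * t + 0.5 * a * t**2 + j * t**3 / 6.0)
    return np.stack(predictions)

# Predict one second ahead at 10 Hz for a vehicle cruising along +x.
path = extrapolate_positions([0, 0], [10, 0], [0.5, 0], [0, 0],
                             dt=0.1, steps=10)
```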
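The Path Planning item’s comfort objective can be phrased as a cost over candidate trajectories. The finite-difference terms and weights below are assumptions chosen to illustrate the idea of penalizing harsh acceleration and jerk.

```python
import numpy as np

def comfort_cost(trajectory, dt, w_accel=1.0, w_jerk=2.0):
    """Score a candidate trajectory (an array of (x, y) positions sampled
    every dt seconds); lower cost means a smoother, more comfortable ride."""
    traj = np.asarray(trajectory, dtype=float)
    vel = np.diff(traj, axis=0) / dt      # finite-difference velocity
    accel = np.diff(vel, axis=0) / dt     # acceleration
    jerk = np.diff(accel, axis=0) / dt    # jerk
    return w_accel * np.square(accel).sum() + w_jerk * np.square(jerk).sum()

def pick_route(candidate_trajectories, dt=0.1):
    """Choose the candidate with the lowest comfort cost."""
    return min(candidate_trajectories, key=lambda tr: comfort_cost(tr, dt))
```

A real planner would add safety, progress, and legality terms to the cost; the structure of "generate candidates, score, pick the minimum" is the part this sketch shows.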
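Finally, for the Control and Execution item, a PID loop is a common baseline for tracking the planner’s output. The sketch below assumes a simple speed-tracking task; the gains are illustrative and not tuned for any vehicle.

```python
class PID:
    """Minimal PID controller tracking a reference signal."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = None

    def step(self, reference, measurement, dt):
        error = reference - measurement
        self.integral += error * dt
        derivative = (0.0 if self.prev_error is None
                      else (error - self.prev_error) / dt)
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Track the planner's target speed at 10 Hz: a positive output maps to
# throttle, a negative one to braking.
controller = PID(kp=0.5, ki=0.1, kd=0.05)
command = controller.step(reference=15.0, measurement=12.0, dt=0.1)
```

Lateral control (steering) typically relies on geometric methods such as pure pursuit or on model-predictive control rather than a bare PID, but the tracking principle is the same.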